-
-
Notifications
You must be signed in to change notification settings - Fork 21.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Workaround resource loading crashes due to buggy TLS #78977
Workaround resource loading crashes due to buggy TLS #78977
Conversation
@@ -296,8 +296,10 @@ void ResourceLoader::_thread_load_function(void *p_userdata) { | |||
// Thread-safe either if it's the current thread or a brand new one. | |||
CallQueue *mq_override = nullptr; | |||
if (load_nesting == 0) { | |||
load_paths_stack = memnew(Vector<String>); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this seems to be the main potential behavior change here, where a new stack will be allocated each time this function is called.
For my peace of mind, can you confirm whether this function is only called once for each ResourceLoader instance, so it's equivalent to when the vector was a member of it? I.e. there's no risk of losing a previous load path when de-allocating this at the end of the function, compared to the previous behavior where it would subsist as class data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be allocated per outermost stack frame, per resource load thread. The overhead of that is not ideal, but it's minimal. In any case, the data block of the vector will have to be allocated and resized normally, so this is just adding one additional allocation to the mix.
Regarding the persistence of data, I could have added a DEV_ASSERT(load_paths_stack.size()) == 0
to this function before this PR. Therefore, there no loss of data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In any case, this is a shot in the dark, I would need a production-like build to be kind of sure it does the trick. The other idea of making TLS data compilation-unit local may be better. I will make another PR and hopefully you can make builds with both ideas?
As pointed out on RC by @bitsawer, this has the same pattern as a Windows + LTS + MinGW GCC/Clang issue observed in the past, such as #47368 fixed by @hpvb. Moving the |
Potential alternative (untested): diff --git a/core/io/resource_loader.cpp b/core/io/resource_loader.cpp
index 4e001efcf8..b6aff109a6 100644
--- a/core/io/resource_loader.cpp
+++ b/core/io/resource_loader.cpp
@@ -202,6 +202,10 @@ void ResourceFormatLoader::_bind_methods() {
///////////////////////////////////
+static thread_local int load_nesting = 0;
+static thread_local WorkerThreadPool::TaskID caller_task_id = 0;
+static thread_local Vector<String> load_paths_stack;
+
// This should be robust enough to be called redundantly without issues.
void ResourceLoader::LoadToken::clear() {
thread_load_mutex.lock();
@@ -401,6 +405,10 @@ Error ResourceLoader::load_threaded_request(const String &p_path, const String &
}
}
+bool ResourceLoader::is_within_load() {
+ return load_nesting > 0;
+}
+
Ref<Resource> ResourceLoader::load(const String &p_path, const String &p_type_hint, ResourceFormatLoader::CacheMode p_cache_mode, Error *r_error) {
if (r_error) {
*r_error = OK;
@@ -1168,10 +1176,6 @@ bool ResourceLoader::create_missing_resources_if_class_unavailable = false;
bool ResourceLoader::abort_on_missing_resource = true;
bool ResourceLoader::timestamp_on_load = false;
-thread_local int ResourceLoader::load_nesting = 0;
-thread_local WorkerThreadPool::TaskID ResourceLoader::caller_task_id = 0;
-thread_local Vector<String> ResourceLoader::load_paths_stack;
-
template <>
thread_local uint32_t SafeBinaryMutex<ResourceLoader::BINARY_MUTEX_TAG>::count = 0;
SafeBinaryMutex<ResourceLoader::BINARY_MUTEX_TAG> ResourceLoader::thread_load_mutex;
diff --git a/core/io/resource_loader.h b/core/io/resource_loader.h
index 592befb603..febd178eb3 100644
--- a/core/io/resource_loader.h
+++ b/core/io/resource_loader.h
@@ -180,9 +180,6 @@ private:
static void _thread_load_function(void *p_userdata);
- static thread_local int load_nesting;
- static thread_local WorkerThreadPool::TaskID caller_task_id;
- static thread_local Vector<String> load_paths_stack;
static SafeBinaryMutex<BINARY_MUTEX_TAG> thread_load_mutex;
static HashMap<String, ThreadLoadTask> thread_load_tasks;
static bool cleaning_tasks;
@@ -196,7 +193,7 @@ public:
static ThreadLoadStatus load_threaded_get_status(const String &p_path, float *r_progress = nullptr);
static Ref<Resource> load_threaded_get(const String &p_path, Error *r_error = nullptr);
- static bool is_within_load() { return load_nesting > 0; };
+ static bool is_within_load();
static Ref<Resource> load(const String &p_path, const String &p_type_hint = "", ResourceFormatLoader::CacheMode p_cache_mode = ResourceFormatLoader::CACHE_MODE_REUSE, Error *r_error = nullptr);
static bool exists(const String &p_path, const String &p_type_hint = ""); |
Here's a build for the original commit in this PR (41c0785): https://downloads.tuxfamily.org/godotengine/testing/Godot_v4.1-rc_pr78977_original_win64.x86_64.zip And here's a build for the proposed alternative moving the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did further testing with @RandomShaper on #core and confirmed that the original approach in this PR fixes the issue. The alternative I suggested didn't.
Thanks! |
if (mq_override) { | ||
memdelete(mq_override); | ||
} | ||
memdelete(load_paths_stack); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be good practice to set pointers to nullptr after calling memdelete to reduce risk of a dangling pointer being inadvertently referenced.
That said, I assume there must already be some sort of guarantee that these variables are not referenced outside of threaded load, so it's not something to change for 4.1 ... just an observation. (I'm not too familiar with this code)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. At that point the thread is about to die and no one could access the variable anymore. That said, it is when assumptions fail to be true for whatever reason (code evolution, etc.) when one would have wished to have put guidelines like set-to-null-after-free in practice without further consideration.
My current diagnosis of the issue is that MinGW's TLS implementation is causing a double-free of the
load_paths_stack
vector, with the second attempt happening when the memory is already tampered and therefore it's not just as nice as seeing the allocation pointer is already null, which would cause no crash.Ideally, this would be a temporary fix.
Workarounds #78734.